118 research outputs found

    And now for something completely different: running Lisp on GPUs

    The internal parallelism of compute resources keeps increasing, and graphics processing units (GPUs) and other accelerators have been gaining importance in many domains. Researchers from life science, bioinformatics or artificial intelligence, for example, use GPUs to accelerate their computations. However, languages typically used in some of these disciplines often do not benefit from the technical developments because they cannot be executed natively on GPUs. Instead, existing programs must be rewritten in other, less dynamic programming languages. On the other hand, the gap in programming features between accelerators and common CPUs keeps shrinking. Since accelerators are becoming more competitive with regard to general computations, they will not be mere special-purpose processors in the future. It is a valid assumption that future GPU generations can be used in a similar or even the same way as CPUs and that compilers or interpreters will be needed for a wider range of computer languages. We present CuLi, an interactive Lisp interpreter that performs all computations on a CUDA-capable GPU. The host system is needed only for the input and the output. At the moment, Lisp programs running on CPUs outperform Lisp programs on GPUs, but we present trends indicating that this might change in the future. Our study gives an outlook on the possibility of running Lisp programs or other dynamic programming languages on next-generation accelerators.

    Pure functions in C: A small keyword for automatic parallelization

    © 2017 IEEE. The need for parallel task execution has been steadily growing in recent years since manufacturers mainly improve processor performance by scaling the number of installed cores instead of the frequency of processors. To make use of this potential, an essential technique to increase the parallelism of a program is to parallelize loops. However, a main restriction of available tools for automatic loop parallelization is that the loops often have to be 'polyhedral' and that, for example, functions must not be called from within the loops. In this paper, we present a seemingly simple extension to the C programming language which marks functions without side effects. These functions can then basically be ignored when checking the parallelization opportunities for polyhedral loops. We extended the GCC compiler toolchain accordingly and evaluated several real-world applications, showing that our extension helps to identify additional parallelization opportunities and, thus, to significantly enhance the performance of applications.

    Simurgh: a fully decentralized and secure NVMM user space file system

    The availability of non-volatile main memory (NVMM) has started a new era for storage systems, and NVMM-specific file systems can support extremely high data and metadata rates, which are required by many HPC and data-intensive applications. Scaling metadata performance within NVMM file systems is nevertheless often restricted by the Linux kernel storage stack, while simply moving metadata management to the user space can compromise security or flexibility. This paper introduces Simurgh, a hardware-assisted user space file system with decentralized metadata management that allows secure metadata updates from within user space. Simurgh guarantees consistency, durability, and ordering of updates without sacrificing scalability. Security is enforced by only allowing NVMM access from protected user space functions, which can be implemented through two proposed instructions. Comparisons with other NVMM file systems show that Simurgh improves metadata performance up to 18x and application performance up to 89% compared to the second-fastest file system. This work has been supported by the European Commission’s BigStorage project H2020-MSCA-ITN-2014-642963. It is also supported by the Big Data in Atmospheric Physics (BINARY) project, funded by the Carl Zeiss Foundation under Grant No.: P2018-02-003.

    Smart Grid-aware scheduling in data centres

    © 2016. In several countries the expansion and establishment of renewable energies result in widely scattered and often weather-dependent energy production, decoupled from energy demand. Large, fossil-fuelled power plants are gradually replaced by many small power stations that transform wind, solar and water power into electrical power. This leads to changes in the historically evolved power grid that favours top-down energy distribution from a backbone of large power plants to widespread consumers. Now, with the increase of energy production in lower layers of the grid, there is also a bottom-up energy flow in the grid infrastructure that compromises its stability. In order to locally adapt the energy demand to the production, some countries have started to establish Smart Grids to incentivise customers to consume energy when it is generated. This paper investigates how data centres can benefit from variable energy prices in Smart Grids. In view of their low average utilisation, data centre providers can schedule the workload dependent on the energy price. We consider a scenario for a data centre in Paderborn, Germany, hosting a large share of interruptible and migratable computing jobs. We suggest and compare two scheduling strategies for minimising energy costs. The first one merely uses current values from the Smart Meter to place the jobs, while the other one also estimates the future energy price in the grid based on weather forecasts. In spite of the complexity of the prediction problem and the inaccuracy of the weather data, both strategies perform well and have a strong positive effect on the utilisation of renewable energy and on the reduction of energy costs. This work improves and extends the paper of the same title published at the SustainIT conference (Mäsker et al., 2015). While that paper puts more emphasis on the utilisation of green energy, the new algorithms find a better balance between energy costs and turnaround time. We slightly alter the scenario using a more realistic multi-queue batch system and improve the scheduling algorithms, which can be tuned to prioritise turnaround time or green energy utilisation.
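The idea of placing interruptible jobs according to the energy price can be illustrated with a minimal sketch. It is not one of the two strategies from the paper: it assumes a per-slot price forecast is already available and simply picks the cheapest slots for one job. The function names `cheapest_slots` and `energy_cost` are hypothetical.

```python
def cheapest_slots(prices, slots_needed):
    """Pick the cheapest time slots for one interruptible job.

    prices: assumed per-slot energy price forecast (e.g. one value per hour).
    slots_needed: number of slots the job must run in total.
    Returns the chosen slot indices in chronological order.
    """
    if slots_needed > len(prices):
        raise ValueError("job does not fit into the planning horizon")
    # Sort slot indices by price; ties are broken in favour of earlier slots.
    by_price = sorted(range(len(prices)), key=lambda t: (prices[t], t))
    return sorted(by_price[:slots_needed])


def energy_cost(prices, slots):
    """Total price paid when the job runs in the given slots."""
    return sum(prices[t] for t in slots)
```

Because the job is assumed to be interruptible, the chosen slots need not be contiguous. A scheduler that also cares about turnaround time would have to trade the saved cost against the later finishing time.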

    Hyperion: Building the largest in-memory search tree

    Indexes are essential in data management systems to increase the speed of data retrievals. Widespread data structures to provide fast and memory-efficient indexes are prefix tries. Implementations like Judy, ART, or HOT optimize their internal alignments for cache and vector unit efficiency. While these measures usually improve the performance substantially, they can have a negative impact on memory efficiency. In this paper we present Hyperion, a trie-based main-memory key-value store achieving extreme space efficiency. In contrast to other data structures, Hyperion does not depend on CPU vector units, but scans the data structure linearly. Combined with a custom memory allocator, Hyperion accomplishes a remarkable data density while achieving a competitive point query and an exceptional range query performance. Hyperion can significantly reduce the index memory footprint, while being at least two times better concerning the performance-to-memory ratio compared to the best implemented alternative strategies for randomized string data sets.

    GekkoFS: A temporary distributed file system for HPC applications

    We present GekkoFS, a temporary, highly-scalable burst buffer file system which has been specifically optimized for new access patterns of data-intensive High-Performance Computing (HPC) applications. The file system provides relaxed POSIX semantics, only offering features which are actually required by most (not all) applications. It is able to provide scalable I/O performance and reaches millions of metadata operations already for a small number of nodes, significantly outperforming the capabilities of general-purpose parallel file systems. The work has been funded by the German Research Foundation (DFG) through the ADA-FS project as part of the Priority Programme 1648. It is also supported by the Spanish Ministry of Science and Innovation (TIN2015–65316), the Generalitat de Catalunya (2014–SGR–1051), as well as the European Union’s Horizon 2020 Research and Innovation Programme (NEXTGenIO, 671951) and the European Commission’s BigStorage project (H2020-MSCA-ITN-2014-642963). This research was conducted using the supercomputer MOGON II and services offered by the Johannes Gutenberg University Mainz.

    Secure genome processing in public cloud and HPC environments

    Aligning next-generation sequencing data requires significant compute resources. HPC and cloud systems can provide sufficient compute capacity, but do not offer the required data security guarantees. HPC environments are typically designed for many groups of trusted users and often only include minimal security enforcement, while cloud environments are mostly under the control of untrusted entities and companies. In this work we present a scalable pipeline approach that enables the use of public cloud and HPC environments, while improving the patients’ privacy. The applied techniques include adding noisy data, cryptography, and a MapReduce program for the parallel processing of data.

    Deduplication potential of HPC applications' checkpoints

    © 2016 IEEE. HPC systems contain an increasing number of components, decreasing the mean time between failures. Checkpoint mechanisms help to overcome such failures for long-running applications. A viable solution to remove the resulting pressure from the I/O backends is to deduplicate the checkpoints. However, there is little knowledge about the potential to save I/Os for HPC applications by using deduplication within the checkpointing process. In this paper, we perform a broad study of the deduplication behavior of HPC application checkpointing and its impact on system design.
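To make the notion of "deduplication potential" concrete, the following sketch estimates what fraction of chunk writes could be skipped across a series of checkpoints. The abstract does not state the chunking or fingerprinting method used in the study, so fixed-size 4 KiB chunks and SHA-256 fingerprints are assumptions here, and `dedup_ratio` is a hypothetical helper name.

```python
import hashlib


def dedup_ratio(checkpoints, chunk_size=4096):
    """Fraction of chunks that fixed-size deduplication would not need to store.

    checkpoints: iterable of bytes objects, one per checkpoint image.
    chunk_size: assumed fixed chunk size in bytes.
    """
    seen = set()   # fingerprints of chunks that were already stored
    total = 0      # number of chunks encountered overall
    for data in checkpoints:
        for offset in range(0, len(data), chunk_size):
            chunk = data[offset:offset + chunk_size]
            total += 1
            seen.add(hashlib.sha256(chunk).digest())
    if total == 0:
        return 0.0
    # Every chunk beyond the first copy of its fingerprint is redundant.
    return 1.0 - len(seen) / total
```

A ratio close to 1.0 would indicate that most checkpoint data repeats earlier chunks, while a ratio near 0.0 would suggest that deduplication adds overhead without saving I/O.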

    GekkoFS: A temporary burst buffer file system for HPC applications

    Many scientific fields increasingly use high-performance computing (HPC) to process and analyze massive amounts of experimental data while storage systems in today’s HPC environments have to cope with new access patterns. These patterns include many metadata operations, small I/O requests, or randomized file I/O, while general-purpose parallel file systems have been optimized for sequential shared access to large files. Burst buffer file systems create a separate file system that applications can use to store temporary data. They aggregate node-local storage available within the compute nodes or use dedicated SSD clusters and offer a peak bandwidth higher than that of the backend parallel file system without interfering with it. However, burst buffer file systems typically offer many features that a scientific application, running in isolation for a limited amount of time, does not require. We present GekkoFS, a temporary, highly-scalable file system which has been specifically optimized for the aforementioned use cases. GekkoFS provides relaxed POSIX semantics, offering only features which are actually required by most (not all) applications. GekkoFS is, therefore, able to provide scalable I/O performance and reaches millions of metadata operations already for a small number of nodes, significantly outperforming the capabilities of common parallel file systems.

    Zeroing memory deallocator to reduce checkpoint sizes in virtualized HPC environments

    Virtualization has become an indispensable tool in data centers and cloud environments to flexibly assign virtual machines (VMs) to resources. Virtualization also becomes more and more attractive for high-performance computing (HPC). This is mainly due to the strong isolation of VMs which enables: (1) the sharing of cluster nodes and optimization of the system’s overall utilization; (2) load balancing by means of migrations due to the reduction of residual dependencies; and (3) the creation of system-level checkpoints increasing the fault tolerance in an application-transparent way. On the downside, the additional virtualization layer conceals information that is only available on the process level. This information has a direct influence on the checkpoint size which should be kept as small as possible. In this paper, we propose a novel technique for checkpoint size reduction in virtualized environments. We exploit the fact that the hypervisor detects zero pages which are omitted when capturing a checkpoint. Moreover, compression techniques are applied for a further reduction of the checkpoint size. We therefore fill freed memory regions with zeros supporting both the zero-page detection and the compression. We evaluate our approach by taking the example of HPC applications. The results reveal a reduction of the checkpoint size by up to 9% when compression is disabled in the hypervisor and up to 49% with compression enabled. Furthermore, memory zeroing is able to reduce VM migration time by up to 10% when compression is disabled and by up to 60% when compression is enabled.
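The effect described in this abstract can be imitated with a toy model. It does not reproduce the authors' deallocator or any hypervisor: guest memory is modelled as a `bytearray`, the page size of 4096 bytes is an assumption, `zlib` stands in for whatever compression the hypervisor applies, and the names `zero_free` and `checkpoint_size` are hypothetical.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size in bytes


def zero_free(memory, offset, length):
    """Overwrite a freed region with zeros, as a zeroing deallocator would.

    The region is assumed to lie completely inside `memory`.
    """
    memory[offset:offset + length] = bytes(length)


def checkpoint_size(memory, compress=False):
    """Bytes needed to store the checkpoint when all-zero pages are omitted."""
    kept = bytearray()
    for start in range(0, len(memory), PAGE_SIZE):
        page = memory[start:start + PAGE_SIZE]
        if any(page):          # skip pages that contain only zero bytes
            kept += page
    return len(zlib.compress(bytes(kept))) if compress else len(kept)
```

Without zeroing, a freed page still holds stale non-zero data and is written to the checkpoint. After `zero_free`, a fully freed page is dropped by the zero-page check, and a partially freed page at least compresses better.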